
fix(bedrock): enable websearch_interception with extended thinking on Bedrock#20489

Closed
Quentin-M wants to merge 9 commits into BerriAI:main from Quentin-M:search_tools_fix

Conversation


@Quentin-M Quentin-M commented Feb 5, 2026

Summary

Rebased on BerriAI/litellm main (Feb 18, 2026) with the following fixes on top of cherry-picked PR #20488:

Websearch Interception

  • Cherry-pick updated PR #20488 (Fix websearch interception with extended thinking mode support) — thinking block preservation through the websearch agentic loop
  • Load api_key/api_base from router's search_tools config (fixes "TAVILY_API_KEY is not set")
  • Auto-adjust max_tokens when <= thinking.budget_tokens (Anthropic requires max_tokens > budget_tokens)
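The max_tokens adjustment described above can be sketched as follows. The helper name and the flat optional_params dict are illustrative assumptions rather than the handler's actual code; the DEFAULT_MAX_TOKENS value (4096) is taken from the PR description.

```python
DEFAULT_MAX_TOKENS = 4096  # headroom added on top of the thinking budget


def adjust_max_tokens_for_thinking(optional_params: dict) -> dict:
    """Ensure max_tokens > thinking.budget_tokens, as Anthropic requires."""
    thinking = optional_params.get("thinking") or {}
    budget = thinking.get("budget_tokens")
    max_tokens = optional_params.get("max_tokens")
    if budget is not None and max_tokens is not None and max_tokens <= budget:
        # Anthropic rejects requests where max_tokens <= budget_tokens
        optional_params["max_tokens"] = budget + DEFAULT_MAX_TOKENS
    return optional_params
```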

Bedrock

  • Centralized beta header filtering with version-based support (replaces inconsistent per-API filtering)
  • Fix version extraction regex in beta headers config
  • Strip context_management from request body for all Bedrock APIs (Invoke Messages, Invoke Chat, Converse)

Thinking

  • Drop thinking param when assistant messages have text without thinking blocks
  • Recognize adaptive thinking type in is_thinking_enabled (Opus 4.6)
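A rough sketch of the "adaptive" recognition follows; the real check lives on LiteLLM's base chat transformation class, so this standalone function is illustrative only.

```python
def is_thinking_enabled(optional_params: dict) -> bool:
    """Return True when extended thinking is requested in any form."""
    thinking = optional_params.get("thinking") or {}
    # Opus 4.6 uses type="adaptive"; earlier Claude models use type="enabled"
    return thinking.get("type") in ("enabled", "adaptive")
```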

Test Plan

  • All websearch interception tests pass (51 passed)
  • All bedrock beta header tests pass (69 passed)
  • All thinking tests pass (8 passed)
  • Ruff linting passes
  • Docker image build + deploy

🤖 Generated with Claude Code


vercel bot commented Feb 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| litellm | ✅ Ready | Preview, Comment | Feb 25, 2026 0:26am |



greptile-apps bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

This PR adds two enhancements to the websearch interception handler: extracting API keys from router configuration and handling thinking parameter constraints for Anthropic models.

Key Changes:

  • Extracts api_key and api_base from router's search_tools config and passes them to litellm.asearch()
  • Adjusts max_tokens when it's less than or equal to budget_tokens by setting it to budget_tokens + DEFAULT_MAX_TOKENS (4096)
  • Conditionally drops the thinking parameter when assistant messages with tool_calls have no thinking_blocks

Critical Issue Found:

  • Lines 440 and 443 pass anthropic_messages (the imported module) instead of follow_up_messages (the messages list) to the helper functions. This will cause the thinking parameter logic to fail completely as it's analyzing the wrong data type.

Confidence Score: 1/5

  • This PR contains a critical bug that will cause runtime failures
  • The wrong variable (imported module instead of messages list) is passed to helper functions, causing the thinking parameter logic to fail. API key extraction looks correct but cannot be safely merged until the critical bug is fixed.
  • litellm/integrations/websearch_interception/handler.py requires immediate attention - lines 439-444 must be fixed before merge

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/integrations/websearch_interception/handler.py | Added API key handling from router config and thinking parameter logic, but critical bug: wrong variable passed to helper functions (module instead of messages list) |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant WebSearchHandler
    participant Router
    participant LLM
    participant SearchProvider

    Client->>WebSearchHandler: Request with websearch tool
    WebSearchHandler->>WebSearchHandler: Pre-request hook: convert native tools
    WebSearchHandler->>LLM: Initial request
    LLM-->>WebSearchHandler: Response with tool_use blocks
    WebSearchHandler->>WebSearchHandler: Detect websearch tool_use

    Note over WebSearchHandler,Router: Extract API keys from router config
    WebSearchHandler->>Router: Get search_tools config
    Router-->>WebSearchHandler: search_provider, api_key, api_base

    loop For each search query
        WebSearchHandler->>SearchProvider: Execute search (with api_key)
        SearchProvider-->>WebSearchHandler: Search results
    end

    Note over WebSearchHandler: Check thinking parameter
    alt thinking.budget_tokens > max_tokens
        WebSearchHandler->>WebSearchHandler: Adjust max_tokens = budget_tokens + 4096
    end

    alt Last tool_call message has no thinking_blocks
        WebSearchHandler->>WebSearchHandler: Drop thinking parameter
    end

    WebSearchHandler->>LLM: Follow-up request with search results
    LLM-->>WebSearchHandler: Final response
    WebSearchHandler-->>Client: Return final response
```

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment


jquinter commented Feb 5, 2026

Hey @Quentin-M, nice PR — both fixes address real production issues and the code is clean. A couple of things to address before merge:

1. Missing litellm.modify_params guard

The Anthropic and Bedrock transformations you're following both gate the thinking drop behind litellm.modify_params:

```python
# litellm/llms/anthropic/chat/transformation.py:1114
if litellm.modify_params:
    optional_params.pop("thinking", None)
```

Your code drops thinking unconditionally. Users who have modify_params disabled would get unexpected behavior. The fix is straightforward — wrap the drop:

```python
if should_drop_thinking:
    if litellm.modify_params:
        params_to_exclude.append('thinking')
        verbose_logger.warning(
            "WebSearchInterception: Dropping 'thinking' param because the last assistant message "
            "with tool_calls has no thinking_blocks. The model won't use extended thinking for this turn."
        )
```

2. Tests required

The project requires at least 1 test in tests/litellm/. Both the API key extraction logic and the thinking parameter handling are testable with mocks — e.g. mock llm_router.search_tools to verify api_key/api_base are extracted, and mock anthropic_messages_optional_request_params with various thinking/max_tokens combos.
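A mock-based test along the lines requested might look like the sketch below. The extract_search_credentials helper and the config shape are stand-ins modeled on the search_tools YAML shown in this PR, not the handler's real API.

```python
from types import SimpleNamespace


def extract_search_credentials(router, tool_name):
    """Pull api_key/api_base for a named search tool from router config."""
    for tool in getattr(router, "search_tools", []) or []:
        if tool.get("search_tool_name") == tool_name:
            params = tool.get("litellm_params", {})
            return params.get("api_key"), params.get("api_base")
    return None, None  # fall back to environment variables


def test_extracts_api_key_and_base():
    # Router stub carrying a search_tools config like the YAML in this PR
    router = SimpleNamespace(search_tools=[{
        "search_tool_name": "tavily-search",
        "litellm_params": {
            "search_provider": "tavily",
            "api_key": "sk-test",
            "api_base": "https://api.tavily.example",
        },
    }])
    api_key, api_base = extract_search_credentials(router, "tavily-search")
    assert api_key == "sk-test"
    assert api_base == "https://api.tavily.example"
```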


Side note: Greptile flagged a "critical bug" claiming anthropic_messages (the module) was passed instead of follow_up_messages to the helper functions. That's incorrect — your code correctly passes follow_up_messages. Just wanted to flag that so you can disregard it.


ghost commented Feb 6, 2026

@greptile


greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

  • Enables Bedrock + Claude extended/adaptive thinking to work with agentic tools by preserving/validating thinking blocks across the websearch interception loop.
  • Centralizes Bedrock anthropic-beta whitelisting/translation with model version/family gating and uses it across Invoke Chat, Invoke Messages, and Converse.
  • Strips context_management from Bedrock request bodies (header-only feature) and tweaks Router to expose resolved provider to callbacks.
  • Adds Opus 4.6 adaptive thinking mappings and test coverage for thinking-block detection + beta header filtering.

Confidence Score: 2/5

  • This PR has one clear merge-blocking correctness issue in Bedrock model ID constants; remaining changes look reasonable.
  • Most changes are additive with tests (thinking block detection, beta header filtering, websearch interception follow-up), but the Opus 4.6 Bedrock Converse model ID in constants drops the ':0' suffix which will break routing/matching for that model until corrected.
  • litellm/constants.py (BEDROCK_CONVERSE_MODELS Opus 4.6 entry)

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/constants.py | Updates BEDROCK_CONVERSE_MODELS; Opus 4.6 entry drops ':0' suffix which breaks matching against other Bedrock IDs. |
| litellm/integrations/websearch_interception/handler.py | Fixes websearch interception loop to preserve kwargs, thinking blocks, and to load search tool credentials from router config. |
| litellm/integrations/websearch_interception/transformation.py | Returns structured TransformRequestResult including tool_calls and thinking blocks; prepends thinking blocks to follow-up assistant message. |
| litellm/litellm_core_utils/core_helpers.py | Extends internal param filtering to handle prefixes and centralizes internal key lists. |
| litellm/llms/anthropic/chat/transformation.py | Adds Opus 4.6 adaptive thinking mapping and drops thinking when last assistant message lacks thinking blocks. |
| litellm/llms/bedrock/beta_headers_config.py | Introduces centralized whitelist/translation for Bedrock anthropic-beta headers with version/family gating. |
| litellm/llms/bedrock/chat/converse_transformation.py | Uses centralized beta filter; strips unsupported context_management body param; adds Opus 4.6 adaptive thinking path. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Filters/translates beta headers via centralized filter and strips context_management from body for Invoke Messages. |
| litellm/router.py | Stores resolved custom_llm_provider into deployment params to make provider visible to callbacks post-alias resolution. |
| litellm/utils.py | Adds helper to detect missing thinking blocks in last assistant message; minor BedrockModelInfo import refactor. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Router as litellm.Router
    participant WSI as WebSearchInterceptionLogger
    participant Provider as Bedrock/Anthropic
    participant Search as litellm.asearch

    Client->>Router: completion(model alias, messages, tools, thinking)
    Router->>WSI: pre_api_call(kwargs incl. resolved custom_llm_provider)
    WSI->>WSI: convert hosted web_search tool -> regular tool
    WSI-->>Router: return {**kwargs, tools: converted}
    Router->>Provider: initial LLM request
    Provider-->>Router: response(content blocks)
    Router->>WSI: async_post_call_success(response, kwargs)
    WSI->>WSI: transform_request() extracts tool_use + thinking blocks
    alt has websearch tool_use
        WSI->>Search: asearch(query, provider, api_key/api_base from router.search_tools)
        Search-->>WSI: search result(s)
        WSI->>WSI: transform_response() builds assistant msg (thinking + tool_use) + user tool_result
        WSI->>Provider: follow-up LLM request(max_tokens adjusted if <= thinking budget)
        Provider-->>WSI: final response
    else no websearch
        WSI-->>Router: passthrough
    end
```

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 3 comments


jquinter commented Feb 6, 2026

Review Findings

Thanks for the comprehensive work on enabling websearch + extended thinking on Bedrock!

CI Status

| Check | Status | Notes |
|---|---|---|
| Lint | FAIL | PLR0915: Too many statements (51 > 50) in handler.py:333 |
| Tests | FAIL | Multiple beta header tests failing |
| CLA | ✅ Pass | |
| Vercel | ✅ Pass | |

Must Fix

1. Lint failure: Function too long

integrations/websearch_interception/handler.py:333:15: PLR0915 Too many statements (51 > 50)

The _execute_agentic_loop function now has 51 statements, exceeding Ruff's limit. Consider extracting some logic into helper functions (e.g., _validate_max_tokens_for_thinking()).

2. Multiple test failures in beta headers code

Several new tests are failing:

  • test_context_management_requires_claude_4_5
  • test_backward_compatibility_existing_headers
  • test_converse_anthropic_model_gets_anthropic_beta
  • test_advanced_tool_use_header_translation_for_opus_4_5
  • test_converse_filters_unsupported_headers
  • And 10+ more...

The beta header filtering logic appears to have bugs in version/model matching that need investigation.

3. Missing litellm.modify_params guard

The existing Anthropic and Bedrock transformations gate thinking drop behind litellm.modify_params:

```python
# Anthropic transformation.py:1114
if litellm.modify_params:
    optional_params.pop("thinking", None)
```

The PR drops thinking unconditionally without checking this flag.

Positive Aspects

  1. Centralized beta header filtering (beta_headers_config.py) is a good architectural improvement
  2. TransformRequestResult NamedTuple makes the transform_request contract explicit
  3. Comprehensive test coverage - 2,400+ lines of new tests
  4. Good documentation - New README.md in litellm/llms/bedrock/
  5. Real user-facing bugs fixed - Bedrock/Claude with websearch + thinking

Suggestion

Consider waiting for upstream PRs (#20488, #20514, #20519) to merge first, then rebasing. This would reduce the PR size and avoid potential merge conflicts with poetry.lock.


greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR implements three main feature areas: (1) websearch interception with extended thinking support on Bedrock, (2) centralized beta header filtering for all Bedrock APIs, and (3) improved thinking param handling for assistant messages without thinking blocks.

Key concerns:

  • Removes native structured outputs for Bedrock Converse: _supports_native_structured_outputs(), _create_output_config_for_response_format(), _add_additional_properties_to_schema(), and outputConfig handling were all deleted. This removes production functionality for Claude 4.5+, Qwen3, Mistral, and DeepSeek models. Multiple existing tests (at least 5) will break.
  • Renames _is_nova_2_model to _is_nova_lite_2_model — narrows matching to exclude Nova-2-Pro, which is a listed model with reasoning support. Existing tests call the old method name and will fail with AttributeError. Nova-2-Pro will no longer receive correct reasoningConfig parameters.
  • Removes Sonnet 4.6 from interleaved thinking and tool search support — patterns for Sonnet 4.6 were stripped from _supports_interleaved_thinking_on_bedrock() and _supports_tool_search_on_bedrock().

What works well:

  • The centralized BedrockBetaHeaderFilter in beta_headers_config.py is well-designed with version-based filtering, family restrictions, and header translation
  • Thinking block preservation through the websearch agentic loop is thoroughly implemented and tested
  • Loading api_key/api_base from router's search_tools config fixes the "TAVILY_API_KEY is not set" issue
  • New tests are comprehensive and mock-based (no real network calls)
  • The max_tokens auto-adjustment when <= thinking.budget_tokens follows existing patterns

Confidence Score: 2/5

  • This PR introduces regressions by removing native structured outputs and Nova-2-Pro reasoning support, which will break existing tests and functionality.
  • While the websearch interception and beta header centralization changes are well-implemented and well-tested, the converse_transformation.py changes silently remove production features (native structured outputs, Nova-2-Pro reasoning) that have existing test coverage. These removals will cause test failures and functional regressions for users relying on those features.
  • litellm/llms/bedrock/chat/converse_transformation.py requires immediate attention — it removes native structured outputs and Nova-2-Pro reasoning support. litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py needs verification of Sonnet 4.6 pattern removals.

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/constants.py | Removes :0 suffix from Claude Opus 4.6 model ID in BEDROCK_CONVERSE_MODELS. Minor change, already discussed in prior review thread. |
| litellm/integrations/websearch_interception/handler.py | Major changes: thinking block preservation through websearch agentic loop, API key/base loading from router search_tools config, max_tokens auto-adjustment for thinking budget. The max_tokens adjustment only checks type == "enabled" but not "adaptive" (minor inconsistency). |
| litellm/integrations/websearch_interception/transformation.py | Refactored to use NamedTuple return types (TransformedRequest, TransformedResponse). Added thinking block capture and preservation. Clean, well-tested changes. |
| litellm/litellm_core_utils/core_helpers.py | Extracted internal params to module-level constants (INTERNAL_PARAMS, INTERNAL_PARAMS_PREFIXES) with prefix-based filtering. Added _is_param_internal() helper. Mostly formatting changes alongside the refactor. |
| litellm/llms/anthropic/chat/transformation.py | Adds last_assistant_message_has_no_thinking_blocks check alongside existing tool_calls check to drop thinking param when assistant messages have text but no thinking blocks. |
| litellm/llms/base_llm/chat/transformation.py | Adds "adaptive" thinking type recognition in is_thinking_enabled(), supporting Opus 4.6. Small, targeted change. |
| litellm/llms/bedrock/beta_headers_config.py | New centralized module for Bedrock beta header filtering with version-based model support, family restrictions, and header translations. Well-documented and extensible design. |
| litellm/llms/bedrock/chat/converse_transformation.py | CRITICAL: Removes native structured outputs support (_supports_native_structured_outputs, _create_output_config_for_response_format, _add_additional_properties_to_schema, outputConfig handling) and renames _is_nova_2_model to _is_nova_lite_2_model (excluding Nova-2-Pro). These removals break existing tests and remove production features. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Integrates centralized beta header filter with translation support, strips context_management, removes Sonnet 4.6 patterns from interleaved thinking support. Simplifies tool search beta header logic. |
| litellm/router.py | Stores custom_llm_provider in deployment's litellm_params after alias resolution, enabling callbacks (websearch_interception) to access the resolved provider. |
| litellm/utils.py | Adds _message_has_thinking_blocks() helper and last_assistant_message_has_no_thinking_blocks() function. Refactors existing any_assistant_message_has_thinking_blocks to use shared helper. Well-tested changes. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Request with web_search tool] --> B{WebSearchInterceptionLogger\nasync_log_pre_api_call}
    B -->|Provider not enabled| C[Pass through to LLM]
    B -->|Provider enabled| D[Convert web_search to\nLiteLLM standard tool]
    D --> E[LLM Response]
    E --> F{async_should_run_agentic_loop}
    F -->|No WebSearch tool_use| G[Return response]
    F -->|WebSearch tool_use detected| H[Extract tool_calls +\nthinking_blocks]
    H --> I[_execute_agentic_loop]
    I --> J[Load search credentials\nfrom router search_tools]
    J --> K[Execute parallel searches\nvia litellm.asearch]
    K --> L[Build follow-up messages\nwith thinking + tool_result]
    L --> M{max_tokens <= budget_tokens?}
    M -->|Yes| N[Adjust max_tokens =\nbudget + DEFAULT_MAX_TOKENS]
    M -->|No| O[Keep original max_tokens]
    N --> P[anthropic_messages.acreate\nfollow-up request]
    O --> P
    P --> Q[Return final response]

    subgraph Beta Header Filtering
        R[anthropic-beta headers] --> S{BedrockBetaHeaderFilter}
        S --> T[Whitelist check]
        T --> U[Version-based filtering]
        U --> V[Family restrictions]
        V --> W[Header translation\ne.g. advanced-tool-use]
        W --> X[Filtered headers to AWS]
    end
```

Last reviewed commit: ab743af

@greptile-apps greptile-apps bot left a comment

18 files reviewed, 5 comments


greptile-apps bot commented Feb 18, 2026

Additional Comments (2)

litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py
Sonnet 4.6 removed from interleaved thinking support

The patterns for sonnet-4.6, sonnet_4.6, sonnet-4-6, sonnet_4_6 were removed from _supports_interleaved_thinking_on_bedrock(). If Sonnet 4.6 models exist in deployment, they will no longer get interleaved thinking beta headers auto-injected. Was this removal intentional? If Sonnet 4.6 does support interleaved thinking on Bedrock, these patterns should be restored.


litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py
Sonnet 4.6 also removed from tool search support

Similarly to the interleaved thinking removal above, Sonnet 4.6 patterns were removed from _supports_tool_search_on_bedrock(). And in _get_tool_search_beta_header_for_bedrock() (line 289), the check was simplified to just "opus-4" in model.lower(), which means Sonnet 4.5 would also no longer get the tool-search-tool-2025-10-19 header via this code path.

However, the centralized BedrockBetaHeaderFilter handles tool-search family restrictions to allow both opus and sonnet families at 4.5+, so this may be partially mitigated by the beta header filter's own logic. Worth verifying this doesn't cause a regression for Sonnet 4.5 tool search on the Messages API. Can you confirm that the centralized beta header filter properly handles tool-search injection for Sonnet 4.5+ on the Messages API, given that _get_tool_search_beta_header_for_bedrock now only injects it for Opus 4 models?

mpcusack-altos and others added 3 commits February 18, 2026 22:20
When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing validation errors. This change ensures thinking blocks from the original response are preserved and included at the start of follow-up assistant messages.

- Created `TransformRequestResult` NamedTuple to capture both tool_calls and thinking_blocks from `transform_request()`, making the contract explicit and extensible
- Modified `transform_request()` to extract and return thinking/redacted_thinking blocks alongside tool calls
- Updated `transform_response()` to accept thinking_blocks and prepend them to follow-up assistant messages
- Passed thinking_blocks through the agentic loop chain: detection → execution → message transformation
- Fixed `transform_request()` to return full kwargs (not just tools) to preserve other request parameters
- Used `filter_internal_params()` utility instead of manual filtering for consistency

This change fixes websearch interception when extended thinking mode is enabled.

**Problem**: When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing the error: `messages.1.content.0.type: Expected 'thinking' or 'redacted_thinking', but found 'tool_use'`

**Solution**: Modified `transform_request()` to capture thinking/redacted_thinking blocks from the original response, and `transform_response()` to include them at the start of the assistant message in follow-up requests.

**Testing**: Successfully tested end-to-end with Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5.

```yaml
model_list:
  - model_name: claude-opus-4-5-20251101
    litellm_params:
      model: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0
      aws_region_name: us-west-2
    model_info:
      supports_web_search: true
litellm_settings:
  callbacks: ["websearch_interception"]
  websearch_interception_params:
    enabled_providers: ["bedrock"]
    search_tool_name: "searxng-search"
search_tools:
  - search_tool_name: searxng-search
    litellm_params:
      search_provider: searxng
      api_base: "https://searxng.example.com"
```

**Note**: Uses `bedrock/` (not `bedrock/converse/`) to route through `anthropic_messages_handler()` which supports agentic hooks.
Fixes issue where websearch interception failed with "TAVILY_API_KEY is not set"
error when using search providers that require API keys.

Changes:
- Extract api_key and api_base from router search_tools configuration
- Pass credentials to litellm.asearch() when available
- Falls back to environment variables when credentials not in config
- Maintains backward compatibility with existing configurations

Root cause:
Handler was only extracting search_provider from router config, but not the
associated api_key and api_base fields. This caused litellm.asearch() to fall
back to environment variables, which failed when keys weren't set in env.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes websearch interception failures when thinking.budget_tokens is set
and requests violate Anthropic's requirement: max_tokens > budget_tokens.

Changes:
- Validate max_tokens against thinking.budget_tokens when extended thinking is enabled
- Automatically adjust max_tokens to budget_tokens + DEFAULT_MAX_TOKENS (4096) when insufficient
- Follows the same pattern as base transformation classes in LiteLLM

This prevents the error: "max_tokens must be greater than thinking.budget_tokens"
when using extended thinking with websearch interception.

Related issue: BerriAI#14194

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Quentin-M and others added 4 commits February 18, 2026 23:35
…pport

Standardize anthropic-beta header handling across all Bedrock APIs
(Invoke Chat, Converse, Messages) using a centralized whitelist-based
filter with version-based model support.

- Inconsistent filtering: Invoke Chat used whitelist (safe),
  Converse/Messages used blacklist (allows unsupported headers through)
- Production risk: unsupported headers could cause AWS API errors
- Maintenance burden: adding new Claude models required updating
  multiple hardcoded lists

- Centralized BedrockBetaHeaderFilter with whitelist approach
- Version-based filtering (e.g., "requires 4.5+") instead of model lists
- Family restrictions (opus/sonnet/haiku) when needed
- Automatic header translation for backward compatibility

- Add `litellm/llms/bedrock/beta_headers_config.py`
  - BedrockBetaHeaderFilter class
  - Whitelist of 11 supported beta headers
  - Version/family restriction logic
  - Debug logging support

- Invoke Chat: Replace local whitelist with centralized filter
- Converse: Remove blacklist (30 lines), use whitelist filter
- Messages: Remove complex filter (55 lines), preserve translation

- Add `tests/test_litellm/llms/bedrock/test_beta_headers_config.py`
  - 40+ unit tests for filter logic
- Extend `tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py`
  - 13 integration tests for API transformations
  - Verify filtering, version restrictions, translations

- Add `litellm/llms/bedrock/README.md`
  - Maintenance guide for adding new headers/models
- Enhanced module docstrings with examples

- Production safety: only whitelisted headers reach AWS
- Zero maintenance for new Claude models (Opus 5, Sonnet 5, etc.)
- Consistent filtering across all 3 APIs
- Preserved backward compatibility (advanced-tool-use translation)

```bash
poetry run pytest tests/test_litellm/llms/bedrock/ -v
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
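The whitelist-plus-version-gating approach described in the commit above can be sketched as follows. The header names, the 4.5 threshold, and the version-extraction regex here are illustrative assumptions, not the actual contents of beta_headers_config.py (which whitelists 11 headers).

```python
import re

# header -> minimum Claude version that supports it (None = any version)
SUPPORTED_BETA_HEADERS = {
    "context-management-2025-06-27": 4.5,
    "interleaved-thinking-2025-05-14": None,
}


def extract_claude_version(model: str):
    """Parse e.g. 'us.anthropic.claude-opus-4-5-20251101-v1:0' -> 4.5."""
    match = re.search(r"claude-[a-z]+-(\d+)[-.](\d+)", model.lower())
    return float(f"{match.group(1)}.{match.group(2)}") if match else None


def filter_beta_headers(headers: list, model: str) -> list:
    """Keep only whitelisted headers the model's version actually supports."""
    version = extract_claude_version(model)
    kept = []
    for header in headers:
        if header not in SUPPORTED_BETA_HEADERS:
            continue  # not whitelisted: never forwarded to AWS
        min_version = SUPPORTED_BETA_HEADERS[header]
        if min_version is None or (version is not None and version >= min_version):
            kept.append(header)
    return kept
```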
…ock APIs

Bedrock doesn't support context_management as a request body parameter.
The feature is enabled via the anthropic-beta header (context-management-2025-06-27)
which was already handled correctly. Leaving context_management in the body causes:
"context_management: Extra inputs are not permitted"

Strip the parameter from all 3 Bedrock API paths:
- Invoke Messages API
- Invoke Chat API
- Converse API (additionalModelRequestFields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
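The body-stripping behavior might be sketched like this; the standalone helper is a stand-in for logic that actually lives inside each Bedrock transformation class.

```python
def strip_context_management(request_body: dict) -> dict:
    """Remove context_management from a Bedrock request body.

    The feature is driven by the anthropic-beta header
    (context-management-2025-06-27); Bedrock rejects the body field with
    "context_management: Extra inputs are not permitted".
    """
    request_body.pop("context_management", None)
    additional = request_body.get("additionalModelRequestFields")
    if isinstance(additional, dict):  # the Converse API nests params here
        additional.pop("context_management", None)
    return request_body
```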
…without thinking blocks

Follow-up to a494503f4b which fixed thinking + tool_use. That fix only
detected missing thinking blocks on assistant messages with tool_calls.
When the last assistant message has plain text content (no tool_calls),
the check returned False and thinking was not dropped, causing:
"Expected thinking or redacted_thinking, but found text"

Add last_assistant_message_has_no_thinking_blocks() to detect any
assistant message with content but no thinking blocks. Extract shared
_message_has_thinking_blocks() helper that checks both the
thinking_blocks field and content array for thinking/redacted_thinking
blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
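The two helpers described in this commit might look roughly like the sketch below. The message shape follows Anthropic-style content blocks, and the function bodies are reconstructions from the commit description rather than the exact litellm code.

```python
def _message_has_thinking_blocks(message: dict) -> bool:
    """True if a message carries thinking or redacted_thinking blocks."""
    if message.get("thinking_blocks"):
        return True
    content = message.get("content")
    if isinstance(content, list):
        return any(
            block.get("type") in ("thinking", "redacted_thinking")
            for block in content
        )
    return False


def last_assistant_message_has_no_thinking_blocks(messages: list) -> bool:
    """Detect the case that triggers dropping the thinking param."""
    for message in reversed(messages):
        if message.get("role") == "assistant":
            # Any content (text or tool_use) without thinking blocks counts
            return bool(message.get("content")) and not _message_has_thinking_blocks(message)
    return False
```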
Upstream only checks for type="enabled" but Opus 4.6 uses type="adaptive".
Without this fix, max_tokens auto-adjustment doesn't trigger for adaptive
thinking, causing API errors.

CLAassistant commented Feb 25, 2026

CLA assistant check
All committers have signed the CLA.

@Quentin-M
Contributor Author

@ryangoldblatt-bm to sign the CLAs 🙏

@ryangoldblatt-bm

> @ryangoldblatt-bm to sign the CLAs 🙏

Done sir, shall we reopen this?
